perm filename NEW.PUB[2,TES] blob
sn#035507 filedate 1973-04-08 generic text, type C, neo UTF8
.COMMENT THE NEW DOCUMENT SYSTEM ;
.TURN ON "{"
.NOJUST
.REQUIRE "PUBMAC.DFS[1,3]" SOURCE_FILE
.STANDARD FRONT("I", "!-A")
.EVEN HEADING({SECNAME},,{DATE})
.ODD HEADING({DATE},,{SECNAME})
.EVERY FOOTING(,{PAGE!})
.GROUP SKIP 5
.BEGIN CENTER
A PROPOSAL FOR THE NEW DOCUMENT SYSTEM
Larry Tesler, Brian Harvey, Lester Earnest,
Tovar Mock, and Robert Sproull
.END
.SKIP 4
The new system has two main purposes:
(1) To provide a means for flexible production of medium-quality
documents such as technical reports, manuals, theses, and books
which may include text, line drawings, half-tone images, and
mathematical symbolism.
(2) To provide a standard representation for such documents that
can be printed or displayed on various kinds of output devices
by various kinds of computers with reasonable results.
The proposed participants in development of the new system are
Stanford University, Carnegie-Mellon University, and Xerox Palo
Alto Research Center.
This proposal was prepared by the Palo Alto Committee, consisting
of Stanford and Xerox
people. The Pittsburgh committee at CMU is concurrently preparing
its own proposal. The two proposals shall be exchanged as well
as submitted to other interested parties for comment, criticism,
and reconciliation.
.SEC BLOCK DIAGRAM
A block diagram of the proposed system is shown in
Figure 1. Dash-boxes represent computer files; plus-boxes
represent visible copy; starred boxes represent programs.
The system starts with a "scribble" in an author's head or
on paper. Using a conventional TEXT EDITOR, the author
prepares a "manuscript" file encoded in a PUB-like language.
The manuscript is fed to the FORMATTER program which produces
a "galley proof". The galley may be printed (or displayed)
by a PRINTER/VIEWER program to be proofread by the author for
errors. To correct errors, changes are made to the manuscript
and the FORMATTER is run again.
Once an acceptable galley proof is obtained, it is fed to the
PAGINATOR and POLISHER programs which produce a "document"
file. This file may be printed (or displayed) by the
PRINTER/VIEWER program. Again, if errors are discovered, corrections
must be made in the manuscript and the cycle repeated.
Auxiliary programs and files that appear in the block diagram
will be explained in subsequent sections.
.SEC THE MANUSCRIPT
The manuscript contains sufficient information for the
system to compute the document without human intervention.
Thus, the system is basically non-interactive. However,
this does not preclude provision for optional interaction
at appropriate points for debugging and advising purposes.
The manuscript is actually a computer program in an as yet
unnamed language, here called P. P is similar to PUB except that PUB
is an augmented subset of SAIL while P is an extension
of MLISP. The complete facilities of MLISP are available
to the author, including variables, arrays, for-statements,
recursion, list structures, function declarations, and
interaction.
Among the extensions to MLISP in P are "text expressions",
"math expressions", "calligraphic expressions", "image
expressions", "portion declarations", "area declarations",
and "group declarations".
.SS TEXT EXPRESSIONS
Text expressions are equivalent to "paragraphs" in PUB.
Every text expression has a "class", which may be specified
in the manuscript explicitly by name or implicitly by
form (cf. "AT n" in PUB). Associated with each class are
formatting procedures. Examples of classes might be "prose",
"quotation", "table", "heading", and "Algolprogram".
A text expression is composed of "words" and each word is
composed of "virtual glyphs" (formerly called "characters").
An example of a virtual glyph (or "virgle") is "Small Seriph
Italic Upright Black Alpha". A "Glyph Map" fed to the system
along with the manuscript maps virgles into "actual glyphs"
or "augles". For example, the glyph map may say that
"Small" is "8 point", "Seriph" is "Elzevir", and "Alpha" is
"Greek 101". Or it may map all sizes into one, all fonts
into LPTFONT, and all glyph-sets into ASCII characters.
The glyph map is conceptually an n-dimensional sparse array
of functions. For example, "Large Seriph Italic A" may be
specified as appearing explicitly in a certain glyph file
or may be specified as a scale-reduction applied to an oversize
glyph.
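The glyph map described above can be sketched as an ordered list of
(pattern, function) rules, which is one way to approximate a sparse array
of functions. This is only an illustration in Python, not P; the names
make_glyph_map, typeset_map, and lpt_map, the four-coordinate virgle, and
the wildcard convention are all assumptions made for the example.

```python
def make_glyph_map(rules):
    """Return a lookup function over (pattern, function) rules; first match wins.

    A pattern is a tuple of coordinate values in which None matches anything.
    This stands in for the "n-dimensional sparse array of functions".
    """
    def lookup(virgle):
        for pattern, fn in rules:
            if all(p is None or p == v for p, v in zip(pattern, virgle)):
                return fn(virgle)
        raise KeyError(virgle)
    return lookup

# Hypothetical virgle layout: (size, font, style, glyph).
typeset_map = make_glyph_map([
    (("Small", "Seriph", None, "Alpha"),
     lambda v: {"size": "8 point", "font": "Elzevir",
                "style": v[2], "glyph": "Greek 101"}),
])

# A line-printer map: every size, font, and style collapses to one augle family.
lpt_map = make_glyph_map([
    ((None, None, None, None),
     lambda v: {"size": "fixed", "font": "LPTFONT",
                "style": "plain", "glyph": "ASCII " + v[3]}),
])
```

Because each entry is a function rather than a value, a rule could just as
well compute a scaled-down version of an oversize glyph instead of naming
a glyph file.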
.SS GLYPHS
Among the n coordinates that define a glyph are:
(1) Code. An integer between 40 and 172 octal selecting a
particular character out of a character set.
(2) Set. A set of up to 91 characters, e.g., Greek Alphabet,
Math symbols 1, Accents.
(3) Case. Upper, Lower. Differs only for letters in alphabets.
(4) Style. Light, Bold, Italic, Bold Italic, Demibold, etc.
(5) Font. Caslon, Elzevir, Times Roman, Lptfont, Datadiscfont,
JohnDoefont.
(6) Size. Measured in Points. The P language has point-pica-inch
conversion primitives.
(7) Orientation. Upright or some other angle between 0 and 360
degrees.
(8) Thickness.
(9) Texture.
(10) Color.
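As a check on coordinates (1), (2), and (6): the code range 40 to 172
octal contains exactly 91 codes, which matches the limit of 91 characters
per set. The sketch below also shows point-pica-inch conversion. It is
written in Python rather than P, the function names are hypothetical, and
it assumes 12 points to the pica and 6 picas to the inch (a common
rounding; the traditional printer's point is slightly smaller).

```python
FIRST_CODE = 0o40    # 32 decimal
LAST_CODE = 0o172    # 122 decimal
CODES_PER_SET = LAST_CODE - FIRST_CODE + 1   # inclusive range

POINTS_PER_PICA = 12
PICAS_PER_INCH = 6   # assumption: rounds the traditional measure to 72 points per inch

def is_valid_code(code):
    """True if code lies in the glyph code range given in the text."""
    return FIRST_CODE <= code <= LAST_CODE

def points_to_picas(points):
    return points / POINTS_PER_PICA

def points_to_inches(points):
    return points / (POINTS_PER_PICA * PICAS_PER_INCH)
```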
.SS THE DEVICE SPECIFICATION
A "Device Specification" file must be fed to the system along
with the manuscript and the Glyph Map. Conceptually, the
Device Specification defines a printing or viewing device
as a set of attributes such as RASTERSCAN, 200PPI, 2FONTS,
NOGRAYSCALE. Actually, the file is a collection of MLISP
DEFPROPs and procedures through which the FORMATTER,
PAGINATOR, and POLISHER programs filter the manuscript to
obtain a document that can be processed by the PRINTER/VIEWER
program for the specified device.
Keeping such procedures on a separate file (usually in LAP
form for efficiency) keeps the kernel system small even when
new devices are added to its capability.
The PRINTER/VIEWER program and the Device Specification File
are provided by each installation for each of its devices.
It may be possible in some cases for an installation to
use a single P/V and Device Spec for several devices. In such
a case, a single document file could be printable on all of them.
.SS THE FORMATTER
The FORMATTER program is similar to the PARSER and FILLER
modules of PUB. The PARSER is replaced by the MLISP compiler
and the LISP system. The FILLER is replaced by modules for
text, math, line-drawings, and images. The pagination
capabilities of PUB are intentionally omitted to simplify
the FORMATTER and to allow more complex capabilities to be
handled by the PAGINATOR program.
During operation of the FORMATTER, the author can monitor
its progress on a terminal, interrupt it at landmark points,
and interact with it at breakpoints and error points.
The FORMATTER may generate tables of contents, indices, etc.
in manuscript format as in PUB. If it does, it swaps in an
ALPHABETIZER program to sort the indices. Then the FORMATTER
is swapped back in to process the generated portions.
A hyphenation capability is included in the text module
for those who like it.
The manuscript is structured into one or more portions,
each of which may be divided into sections. Non-global
declarations are local to portions and to sections (unlike PUB).
Thus, it is possible to format sections independently, but care
must be taken if there are interactions (e.g., figure numbering
that does not start over at 1 in each section).
.SEC GALLEYS
The FORMATTER outputs two files called the "galley" and the "galley
guide" (analogous to the PUInS.PUI and the PUIn.PUI files of PUB).
The galley contains text, drawing directives, and image directives,
with sufficient information so that the Printer/Viewer program
can display it provisionally justified but not paginated. There
is a single column for each section. Footnotes and diagrams appear
shortly after the text that references them. Cross-references are not
resolved.
The galley guide is an abstract of the galley in which content is
omitted, size information is elaborated, and pagination directives
are carried forward. The galley guide contains sufficient information
for the PAGINATOR program to lay out the document into pages, areas,
boxes, and columns.
.SS THE PAGINATOR
The PAGINATOR Program does not input the galley but only the
galley guide. It essentially juggles rectangles and possibly
other shapes to fit them into pages, areas, and columns,
keeping groups together, placing footnotes below their
referents, and keeping figures near the texts that describe
them.
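A minimal sketch of the rectangle juggling, assuming the simplest possible
case: a single column, fixed page height, and groups that must not be
split across pages. It is a greedy first-fit in Python, not the proposed
PAGINATOR; the function name and the (label, height, group) record are
invented for the illustration, and footnote and figure placement are not
modeled.

```python
from itertools import groupby

def paginate(rects, page_height):
    """Lay out rectangles in galley order, keeping each group on one page.

    rects: list of (label, height, group) tuples; consecutive entries with
    the same group key form a group. Ungrouped items need unique keys.
    Returns a list of pages, each a list of labels.
    """
    pages, current, used = [], [], 0
    for _, members in groupby(rects, key=lambda r: r[2]):
        members = list(members)
        block = sum(height for _, height, _ in members)
        if block > page_height:
            raise ValueError("group is taller than a page")
        if used + block > page_height:
            pages.append(current)      # close the page and start a new one
            current, used = [], 0
        current.extend(label for label, _, _ in members)
        used += block
    if current:
        pages.append(current)
    return pages
```

Note that only sizes and grouping are consulted, never content, which is
why the galley guide alone is enough input.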
The PAGINATOR needs to know device specifications but nothing
about glyphs. It also needs to know the author's pagination
directives from the manuscript. These can all be found in the
galley guide.
The principal output of the PAGINATOR is the "Paginated Galley
Guide". This is probably in the same format as the Galley Guide,
but its content is sorted, structured, and pruned.
Whenever the PAGINATOR completes a page, it writes all
cross-reference labels that appeared on that page onto a file called
the "Cross-Reference Table" (CRT? -- no, XRT!).
.SS THE POLISHER AND THE DOCUMENT
Some Printer/Viewer programs may have the sophistication to be
able to input the galley, the paginated guide, and the XRT and
display a finished document (see dotted line in Figure 1).
However, the normal procedure is to feed them to the POLISHER
program which produces a well-ordered "document" file in which
pages are together and cross-references are resolved. This
file is easily handled by the P/V.
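The resolution step can be sketched as follows, assuming the XRT is simply
a table from label to page number. The {REF label} placeholder syntax and
both function names are hypothetical; the proposal does not fix a format
for unresolved references in the galley.

```python
import re

def build_xrt(pages):
    """pages: list of label lists, one per completed page (numbered from 1)."""
    xrt = {}
    for number, labels in enumerate(pages, start=1):
        for label in labels:
            xrt[label] = number
    return xrt

def polish(text, xrt):
    """Replace each hypothetical {REF label} placeholder with its page number."""
    return re.sub(r"\{REF (\w+)\}", lambda m: str(xrt[m.group(1)]), text)
```

A reference to a label that never reached the XRT raises KeyError here,
which is one place an error message would belong.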
.SS THE PRINTER/VIEWER
This device-dependent program can print either the galley or
the polished document, because both files are in the same
format.
For raster devices, the P/V may have two passes. One
generates bit matrices from vector/text representations, while
the other actually prints the matrices.
The P/V program may be parametric at the option of the installation.
In certain cases, it may be possible to substitute certain fonts for
others, to change the resolution specification, or to select certain
pages for output.
The P/V is the only program that looks at the actual images
of glyphs. These glyphs are in a form appropriate to the
device, e.g., octal code, bit matrix, vector outline. The
actual image is normally computed from a contour representation
extracted from the Registry.
.SEC THE REGISTRY
There is a Network Registry of Glyphs as well as local
registries. A document referring to a local registry
cannot be transmitted over the Network. Use of local
registries should be limited to storing new glyphs that
have not had an opportunity to be registered in the
Network Registry.
The Registry consists of a Glossary and a Directory.
The Glossary lists the available Sets, Cases,
Styles, Fonts, and so forth. There is a procedure
for adding new entries to the Glossary, e.g., the
Russian alphabet to the Set Glossary or Clarendon to
the Font Glossary. It is also possible to add new
characters to existing incomplete sets.
The Directory lists every Glyph File registered by
a participating installation, including its coordinates
in the sparse array, complete file name, and site name.
The coordinates must use the terminology of the
Glossary.
It is not permissible to change a glyph file once it has
been registered in the Directory.
.SS GLYPH FILES
Each Network Glyph File defines up to 91 glyphs. The file header
contains geometric information needed by the FORMATTER and
POLISHER programs, such as height, width, kerning profiles,
and transformation clues for changing scale, orientation,
and thickness. The remainder of the file contains a curved
contour representation of each glyph.
Each local installation is expected to have its own GLYPH
CONVERTER to generate local glyph files (see Figure 2).
The headers are simply copied from Network Glyph Files,
possibly changing scale, orientation, and thickness. The
contours are converted to bit matrices or vector outlines
as appropriate.
In the case of trivial devices such as line printers, trivial
glyph files should be produced by the installation. However,
it is important to stay within the framework of the registry.
For example, if the LPT has an integral sign, it should be
specified in the glyph map as, say, "math-set 63" rather than
as "latin-set 14". The local math-set glyph file would then
specify that glyph 63 is really octal 14 on the LPT. Other
glyphs in the local math-set file would have no good
representation on the LPT.
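The integral-sign example can be sketched as a lookup through a trivial
local glyph file. The dictionary layout and the name lpt_code are
assumptions for illustration; only the mapping of math-set glyph 63 to
octal 14 comes from the text.

```python
# Trivial local glyph file for the LPT: glyph number -> device code.
LOCAL_MATH_SET = {63: 0o14}   # the integral sign is really octal 14 on the LPT

def lpt_code(glyph_set, glyph_number, local_files):
    """Return the LPT code for a registered glyph, or None if the LPT has
    no good representation for it."""
    return local_files.get(glyph_set, {}).get(glyph_number)

local_files = {"math-set": LOCAL_MATH_SET}
```

The manuscript keeps saying "math-set 63" regardless of device, so the
same glyph map entry still works on a device with a real math font.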
.SEC THE MLISP EXTENSION
Several simple changes to MLISP will be made:
(1) Contraction. Some features that would be useless to the
system and to most authors will be removed in the interest of
saving space. Authors needing these features could LAP them in.
(2) Macros. The MLISP "DEFINE" only replaces one token by another.
Macros in P must be able to replace either an identifier or a
sequence of delimiters by an arbitrary sequence of tokens.
Invisible tokens such as spaces, tabs, and line boundaries must
be recognized as tokens in text expressions of P.
(3) Strings. The LISP string facilities are different in every
system and inadequate in all. P will have its own string package
with a few primitives to be encoded in LAP for each object machine.
A string will be a series of glyphs; thus, the package would compute
widths and heights of text units such as words at high speed.
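The macro requirement in (2) amounts to replacing a sequence of tokens by
another sequence. A minimal sketch in Python, assuming the text has
already been split into tokens (with spaces kept as tokens), that the
longest matching macro wins, and that replacements are not rescanned.
None of these choices is specified in the proposal, and the name
expand_macros is hypothetical.

```python
def expand_macros(tokens, macros):
    """Replace token sequences according to macros.

    macros: dict mapping a non-empty tuple of tokens (an identifier or a
    sequence of delimiters) to a list of replacement tokens.
    """
    keys = sorted((k for k in macros if k), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for key in keys:
            if tuple(tokens[i:i + len(key)]) == key:
                out.extend(macros[key])
                i += len(key)
                break
        else:
            out.append(tokens[i])   # no macro starts here; copy the token
            i += 1
    return out
```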
.SS ADVANTAGES AND DISADVANTAGES OF MLISP
Among the advantages of an MLISP implementation of the new
system are:
(1) Efficiency. The language will be processed by an extension
of the existing MLISP compiler, which translates at 3000
lines per minute, more than three times faster than PUB
Pass One. Most PUB macros could be procedures (EXPRs and
FEXPRs) in P, so their execution will be several times faster
than in PUB (PUB spends much of its time expanding macros).
(2) Flexibility. Author procedures could directly call
or redefine procedures in the system. During debugging,
the author could set breakpoints and perform traces.
(3) Portability. The extended MLISP compiler will be written
mostly in STANDARD LISP, so that it will be transportable
to new installations with a minimum of effort.
The system should run equally well (except for speed differences) in
LISP1.6, TENEX-LISP, ILSP, MACLISP, and LISP70. With a small amount
of LAP programming, it should run in LISPs on computers other than
the PDP-10 as well.
Disadvantages of MLISP are:
(1) Size. The LISP1.6 version of the FORMATTER will probably be
nearly as large as PUB Pass One, because of LISP and MLISP overhead.
This will be remedied when LISP70 is operational.
(2) Inefficiency. The PAGINATOR and POLISHER may be simple enough to
be programmed in machine language at a substantial gain in efficiency.
This may be done after portable LISP versions are operational.
.SEC STANDARDS
The following file formats shall be standardized:
.BEGIN VERBATIM
(1) Individual Documents
a. Manuscript.
b. Galley and Document (same format).
c. Cross-Reference Table.
d. Galley Guide.
e. Paginated Galley Guide (similar to d?).
(2) Registry
a. Glossary
b. Directory
c. Glyph File Header
d. Curved Contour Representation
.END
The following programs shall be written in portable fashion:
(1) FORMATTER
(2) PAGINATOR
(3) POLISHER
.SEC REALIZATION
Manuscript and Registry standards shall be proposed by
Palo Alto and Galley and Document standards by Pittsburgh.
The FORMATTER shall be programmed by Rich Johnson and
Brian Harvey with assistance by Larry Tesler.
The PAGINATOR and POLISHER shall be programmed at CMU.
MLISP extensions shall be made at Stanford.
The ILSP implementation will be maintained by CMU, the
LISP1.6 (and later LISP70) implementations by Stanford,
and the TENEX-LISP implementation by Xerox.
Each installation shall provide its own glyph converters,
text editors, device specifications, and printer/viewers.
However, the possibility of collaborating on XGP service
should be explored as the project proceeds. CMU shall
be the motivating force and shall do most of the programming.
A target date of August 15 is suggested for a first version
of the system. Although only a subset will be implemented
in the first version, the framework for supplying the
remainder must be provided.
This optimistic estimate is based on the fact that PUB
was completed in six months by one person in an
inappropriate language. The new implementation is simplified
by separating pagination from filling and by building on
an existing compiler. Although the new system has many
sophisticated facilities, they have all been done before in
some form by some of the implementors.
.SEC APPENDICES
Included for completeness are memos by Dan Swinehart on
the registry, by Brian Harvey on math, and by Bob Sproull
on graphics. Unfortunately, Sproull's document is
not machine-readable, so only an abstract appears here.
It should be noted that these documents were prepared
before the above committee report. Therefore many
points have been incorporated into the report or
rejected by the committee.
.SEC SOME THOUGHTS ON STANDARD CHARACTER REPRESENTATION
.TURN ON "∪↓_" BEGIN NOJUST NOFILL RETAIN
D. Swinehart -- 30 March, 1973
Reference: CHARAC.PRO[ESS,JMC] -- also a NIC document, don't know #
.END
.SS Registry Character Representation
1. Assume an arbitrary sized addressing space, say 100 bits (large enough).
The ∪Registry (official character specification for all recognized characters)
is therefore going to be sparse, and will have to be represented in some
complex structure. Each entry in the Registry is a character description,
expressed in some accurate way.
2. Any set of entries which are logically related (a "↓_character_set"_↓)
will want to occupy consecutive locations in the address space -- high
order bits identify which character set ("font").
3. Any character set which represents a font of the "standard" character
set (96-char ASCII or whatever), or would have some reason to want to
map onto that character set (e.g., Greek, Cyrillic, some other script)
should be arranged so that each entry's low order 7 bits (say) is the
ASCII for the character it represents. Or something like that.
4. Some number of high order bits, perhaps leaving several above the
standard 7 or 8 for expansion of the basic set, can be officially
designated ↓_font_bits_↓, applying to character sets as described in
(3). Others could be designated "nofont" bits, representing specific
unrelated graphics with no direct ASCII mappings.
5. Additional bits, I guess, could be designated to scaling, rotating,
and slanting fields, for those character-machine combinations where
graphics must be hand tuned when new size or distortion characteristics
are introduced. It would be better, I think, if these fields were left
out of the Registry, and introduced into specific machine-dependent
representations, since some implementations will be able to compute
scaled and tilted characters from their normal specification.
6. If a field designated to a purpose (font, etc.) overflows, additional
unused bits can be assigned to extend it -- they need not be contiguous.
(Larry Tesler's more structured suggestions, assigning to each character
a set of property-value attributes, avoid some of this -- I've stayed
pretty primitive for reasons I don't entirely understand).
7. The first few character addresses (say 0001 - 0111) will not be assigned
any graphics. They are reserved as special control characters (see below).
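Items 2 through 4 can be sketched as simple bit-field arithmetic: the low
order 7 bits carry the ASCII code and everything above them identifies
the character set. This is a Python illustration only; the function names
are hypothetical and the exact field widths above the low 7 bits are left
open, as they are in the memo.

```python
CHAR_BITS = 7                        # low-order bits holding the ASCII code (item 3)
CHAR_MASK = (1 << CHAR_BITS) - 1

def registry_address(font_bits, ascii_code):
    """Combine high-order font bits with a 7-bit character code."""
    if not 0 <= ascii_code <= CHAR_MASK:
        raise ValueError("character code must fit in 7 bits")
    return (font_bits << CHAR_BITS) | ascii_code

def split_address(address):
    """Return (font_bits, ascii_code) for a registry address."""
    return address >> CHAR_BITS, address & CHAR_MASK
```

Python integers are unbounded, so a 100-bit address space needs no special
handling here; on a real machine the address would span several words.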
↓_Translation_Specifications_↓ (optional)
1. A given installation may decide that certain of the high order bits
are much more common than others. To get compact file representations,
they would like to shuffle things so that these bits reside just "above"
the basic character-set bits.
2. To do this, each file can specify (at its beginning or in additional
attributes) a translation rule taking ↓_normalized_characters_↓ (see
below) in the file to registry characters. A text file coming over the
net might be translated twice (from remote translation to registry,
from registry to local translation) before being stored. The translation
rule specifies the largest character size (in bits), MAXCH, which
an untranslated (file) character will attain.
3. Each installation can specify a default translation.
.SS Installation Implementation
1. There is no mention in the above of a standard byte size or the
equivalent. The installation is free to choose any byte size it wishes,
as long as it is >4 (or so). However, 7 bits is about minimum
for a reasonable representation.
2. A character representation is simply enough bytes to represent the
largest normalized character. We'll call this number ∪n, where ∪n is MAXCH/bytesize, rounded up.
Part of the Translation specification's job is to distribute parts of
the registry character representations such that reasonable things fall
into reasonable bytes.
3. Nobody wants to treble or quadruple the size of his file just to get all
these features, so we want a way to distribute parts of characters which
will remain constant over large segments of a file. The following special
characters (which will be recognized no matter what the prefix) will
be interpreted as commands:
.begin nofill nojust retain
1 -- ∪prefix -- the next byte is a byte count, ∪b. The byte after
that is a byte index, ∪i. The next ∪b bytes will replace the
∪i-∪b+1th to ∪ith bytes of the current prefix.
2 -- ∪charsize -- the next byte contains the size, ∪c, in bytes, of
each subsequent file character. A ↓_normalized_character_↓
is then usually obtained by concatenating the current prefix
to the next ∪c bytes in the file. There should be a system
standard prefix, with ∪i=∪n-1, ∪b=∪n-1, ∪c=1.
3 -- ∪escapeset -- the next byte is ∪m, the size of an escape
character -- the default ∪m is ∪n.
4 -- ∪escape -- the next ∪m bytes form a full specification for the
desired normalized character.
.end
A registry character can then be formed from each normalized one, or
the device-dependent character specifications can just be stored
in normalized form.
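The four commands above can be sketched as a small decoder. Several
details are assumptions that the memo leaves open: the standard prefix is
taken to be n-1 zero bytes, a normalized character is formed from the
first n-c prefix bytes followed by c file bytes, and command codes are
recognized only at the first byte of a file character. The name decode
is hypothetical.

```python
PREFIX, CHARSIZE, ESCAPESET, ESCAPE = 1, 2, 3, 4

def decode(data, n):
    """Decode file bytes into normalized characters (tuples of byte values).

    data: a bytes object; n: bytes per normalized character.
    """
    prefix = [0] * (n - 1)    # assumed system standard prefix
    c, m = 1, n               # default character size and escape size
    out, pos = [], 0
    while pos < len(data):
        byte = data[pos]
        if byte == PREFIX:
            b, i = data[pos + 1], data[pos + 2]
            # replace bytes i-b+1 .. i of the prefix (1-based, inclusive)
            prefix[i - b:i] = data[pos + 3:pos + 3 + b]
            pos += 3 + b
        elif byte == CHARSIZE:
            c = data[pos + 1]
            pos += 2
        elif byte == ESCAPESET:
            m = data[pos + 1]
            pos += 2
        elif byte == ESCAPE:
            out.append(tuple(data[pos + 1:pos + 1 + m]))
            pos += 1 + m
        else:
            out.append(tuple(prefix[:n - c]) + tuple(data[pos:pos + c]))
            pos += c
    return out
```

With the defaults, an ordinary file costs one byte per character, which is
the point of the scheme: the constant high-order bytes are stated once.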
Often people will want to override a prefix for some period of time, then
return to a previous setting. This nesting can be provided by commands
at this level, or left to higher levels (like PUB).
At a given site, there will be processors (compilers, assemblers) which,
at least at first, will not want to handle the full generality of this
design. If the design were adopted, these processors would have to be
modified just a bit. They should be able to get away with simply recognizing
and ignoring all the control commands, including prefixes, treating all
characters as if they were standard font ASCII.
.SEC MATHEMATICAL NOTATION
by Brian Harvey
.TABBREAK
The linear typein of mathematical displays requires
a wide variety of commands to be accepted, for different
formatting operations, e.g., subscript. This variety seems
to me to preclude the use of single- or double-character
commands; instead, word commands like SUB for subscript
should be used. This means that some escape convention
must be provided to make PUB or its successor distinguish
command words from text.
At Composition Technology, we had two notations for
handling this problem. Individual command words were
preceded by an escape character (we used @), and for cases
like mathematics where many commands would be used in a row,
a line starting with a tab was considered to be all commands.
The latter notation is clearly inappropriate for PUB, but
some character sequence analogous to curly brackets could be used to
bracket math-style commands. It would probably be a good idea
to define a single-command escape like @ as well.
By convention, a one-letter "command" is taken at CTI
to mean that the letter should be printed in italic. This
works out very nicely for math, because most variables are
normally italicized. Thus, to print
.BEGIN NOFILL INDENT 10
iπ
e + 1 = 0
.END CONTINUE
a CTI typist would type
.BEGIN NOFILL CENTER
display e sup i pi base+1=0 dpyend
.END CONTINUE
(Within a display, spaces are typed only to separate command words
and are otherwise ignored. The spacing of the display is controlled
by the computer.)
Most display formatting commands come in bracketing pairs,
like display...dpyend and sup...base in the example above. This
notation is somewhat more verbose than necessary, but has the
advantage that inner operations can be closed automatically by an
outer operation's terminator provided that the two operations are
of different types; also, ample error and warning messages are
possible. For certain formats with relatively simple contents,
macros with a simpler format can be defined. For example, to get
a case fraction like the 1/2 on a typewriter, the canonical syntax
is "case 1 csden 2 csend"; however, a standard macro "cfract (1/2)"
is provided.
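The automatic closing of inner operations can be sketched with a stack.
This is a Python illustration, not the CTI implementation: the input is
assumed to be already split into tokens, only three bracketing pairs are
listed, and the tree representation and the name parse_display are
invented for the example.

```python
PAIRS = {"display": "dpyend", "sup": "base", "div": "divend"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}

def parse_display(tokens):
    """Build a nested (operator, children) tree from a token list.

    A terminator closes its own operation and, automatically, any inner
    operations of other types that are still open.
    """
    root = ("root", [])
    stack = [root]
    for tok in tokens:
        if tok in PAIRS:
            node = (tok, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif tok in CLOSERS:
            opener = CLOSERS[tok]
            if not any(node[0] == opener for node in stack):
                raise SyntaxError(f"{tok} without matching {opener}")
            while stack[-1][0] != opener:
                stack.pop()           # inner operation closed automatically
            stack.pop()
        else:
            stack[-1][1].append(tok)
    if len(stack) > 1:
        raise SyntaxError(f"unclosed {stack[-1][0]}")
    return root[1]
```

The two SyntaxError cases are where the "ample error and warning messages"
mentioned above would be issued.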
Some examples of formatting operators besides those already
mentioned are FAB for "function abbreviation" as in fab(cos)
(abbreviations like cos for fab(cos) would be standardly provided),
div...den...divend for a stacked fraction ("den" is for denominator),
coef...coden...coend for binomial coefficient, barovr...barend and
barudr...barend, plus some having to do with more global formatting
like dpyno...dnoend for a display number to be printed at the margin.
A more complicated problem is a matrix, which would include row and
column operators to separate and position the cells. I have a
complete list of the operations used at CTI, but it seems pointless
to include it in a document like this.
Typesetting mathematics also requires decisions to be made
about the representation of characters in different fonts, etc.
This problem is addressed below.
.SS MATHEMATICS -- IMPLEMENTATION NOTES
Unfortunately, it seems unlikely that the mathematics
processor can be written completely independently of the text
processor. For one thing, mathematical equations are sometimes
found within a line of text (this situation is hereafter called
a DIT for "display-in-text"), and the text line might have to be
broken within the equation. Therefore the text processor needs
not just an "atomic" string representing the equation, but a
good deal of break-precedence information within the dit. Also,
a great deal of low-level code could be shared by all sections of
the program; for example, the math part and the graphics part both
need to draw vectors. This means that the entire program will
have to be collectively designed in some detail before people can
go off and do their part.
One question which must be answered is the degree of
sophistication required in handling formatting problems. For example,
when parentheses are to be used around some tall expression like a
stacked fraction, there are several ways the size of the parens can
be determined:
1. There can be only two sizes, regular and big (say, 10 and 20 point),
and the user can type (...) or obgpar...cbgpar as desired.
2. There can be an explicit size operator, say "size(#) (" where #
is the desired size in suitable units.
3. The program can recursively typeset the stuff inside the parens
and then go back and figure out how big to make the parens.
Of course, one can also imagine some combination of these with manual
override to an automatic calculation, etc. The advantages of #3
should be obvious. The disadvantages include the rather intricate
recursion problems (remember that parentheses are sometimes
unbalanced, so in the general case a backtracking procedure is
needed!), the difficulty of scaling characters on raster hardware,
and the slowing down of an already slow program. Also, as a matter
of aesthetics, the quantization of paren sizes is not obvious.
Infinitely variable height to match the stuff inside would make
each set of parens look funny by comparison with a possibly
slightly different set nearby.
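Option 3 combined with quantized sizes might look like the sketch below:
the contents are measured recursively, then the smallest stock paren size
that covers them is chosen. It assumes balanced parentheses (so no
backtracking), and the expression format, the list of sizes, and the rule
gap are all hypothetical values picked for the illustration.

```python
PAREN_SIZES = (10, 14, 20, 28)   # hypothetical stock sizes, in points
RULE_GAP = 2                     # hypothetical space taken by a fraction rule

def paren_size(inner_height):
    """Smallest stock size covering the contents, or the largest available."""
    for size in PAREN_SIZES:
        if size >= inner_height:
            return size
    return PAREN_SIZES[-1]

def height(expr):
    """Height in points of ("char", pts), ("row", [exprs]),
    ("frac", numerator, denominator), or ("paren", inner)."""
    kind = expr[0]
    if kind == "char":
        return expr[1]
    if kind == "row":
        return max(height(e) for e in expr[1])
    if kind == "frac":
        return height(expr[1]) + height(expr[2]) + RULE_GAP
    if kind == "paren":
        return paren_size(height(expr[1]))
    raise ValueError(kind)
```

Quantizing means two nearby sets of parens with slightly different
contents usually come out the same size, which addresses the aesthetic
objection raised above.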
Another formatting decision concerns the question of line
breaking within a long display. The ultimate thing would be for
the program to decide where to break the line (a function of
mathematical meaning and distance into the line) and also how to
align the two parts vertically. At the opposite extreme both
line breaking and alignment could be manually controlled. If the
interactive editor with TV display is really going to happen, the
latter possibility is not as bad as it sounds.
.SS FONT INFORMATION STORAGE.
(Editor's Note: This plan was revised extensively
by the Palo Alto Committee. See main proposal.)
In my opinion the proposed master font registry
should not be the source of font information for
production programs. Instead, there should be a Font
Information File, in a standardized format, to
describe those characters actually used for a
particular job. This file need not contain the
actual character generation information, but
merely certain dimensional information and a
device-specific pointer to find the character for
output. One reason for this is that in the interests
of efficiency programs shouldn't have to dig through
an immense file each time a job is run. (These FIFs
would undoubtedly survive many typesetting runs.)
Another is that people with limited hardware, e.g.,
a line printer, shouldn't have to register the
nonexistent generational data for their one and only
font in order to be able to use the system; instead,
they need only generate a trivial FIF. Even for
non-trivial devices, if we have, e.g., a raster device
and the registry standard is oriented to vector
devices, we can have our own FIF pointing directly to
a raster description of the char rather than having to
generate it each time from the vector description.
In the CTI system, the full name of a character
has four parts: font, style, overlay, and char code.
The font code represents things like Times or Garamond;
the style code is for italic or boldface; the overlay
indicates a set of chars like greek, math, or accents;
and the char code is a 7-bit code to determine the
exact character. The full name of a character is 18
bits long, but in files, a condensed notation is used:
the font and style are globally set by escape code
sequences like @I for italic, overlay 1 chars (more or
less the same as ASCII) are simply represented by their
7-bit code, and other chars are represented by two bytes,
one for the overlay and one for the char. This uses
the low, non-printing ASCII codes for overlay codes (and
for overlay-independent chars like fixed spaces), which
makes for a problem at Stanford, but it's not as bad
as it might be since, as will be seen shortly, people
never actually type overlay codes.
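The condensed notation can be sketched as follows, leaving out the font
and style escape sequences. The sketch assumes that an overlay number is
stored directly as a non-printing code below 32, that overlay 1 characters
always have printing codes, and that overlay-independent characters such
as fixed spaces are ignored; the actual CTI code assignments are not given
here, and the function names are hypothetical.

```python
def condense(chars):
    """Encode (overlay, code) pairs: one byte for overlay 1, two otherwise."""
    out = []
    for overlay, code in chars:
        if not 0 <= code < 128:
            raise ValueError("char code must fit in 7 bits")
        if overlay == 1:
            if code < 32:
                raise ValueError("overlay 1 codes are assumed to be printing")
        else:
            if not 2 <= overlay < 32:
                raise ValueError("overlay must map to a non-printing code")
            out.append(overlay)      # overlay byte precedes the char byte
        out.append(code)
    return out

def uncondense(data):
    """Inverse of condense: recover (overlay, code) pairs from byte values."""
    out, i = [], 0
    while i < len(data):
        if data[i] < 32:             # a low code announces an overlay
            out.append((data[i], data[i + 1]))
            i += 2
        else:
            out.append((1, data[i]))
            i += 1
    return out
```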
The font code is not tied to an "absolute" font
like Times!!!!! Instead, font 1 is, say, a serif font,
font 2 is script, and font 3 sans-serif. This way,
files may contain commands like @SANSER for sans-serif,
and to change fonts all you have to do is use a
different FIF, which controls the relative to absolute
font conversion. In fact, in theory none of the char
name is absolute in this sense--systematic errors in
interpreting a manuscript character can be corrected by
producing a nonstandard font file in which the bad name
is changed to indicate the good character.
Another sort of name possessed by a character is
its mnemonic. This allows the typesetting of non-keyboard
characters by saying @INT for integral, etc. At CTI the
mnemonic represents only the overlay and char codes, so
one can have light and bold INTs. (This is especially
important for accents, which come in all fonts and styles.)
This is what prevents users from having to type overlay
codes explicitly. Thus in Stanford's case it would be
possible to accept a Stanford-ASCII source file and put out
a real-ASCII-compatible output.
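As an illustration of how a mnemonic fills in only part of the name,
here is a small sketch. The table entries and their overlay and char
code values are invented; only the idea that the font and style come
from the current state is taken from the description above.

```python
# Hypothetical table: mnemonic -> (overlay, char code). Values are invented.
MNEMONICS = {
    "INT": (2, 0x49),
    "CEDILLA": (3, 0x2C),
}

def full_name(font, style, mnemonic):
    """Combine the current font and style with a mnemonic's overlay and code."""
    overlay, code = MNEMONICS[mnemonic]
    return (font, style, overlay, code)
```

Calling full_name(1, "light", "INT") and full_name(1, "bold", "INT")
gives two different characters that share one mnemonic, which is how
light and bold INTs coexist.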
The information in a FIF must include, for each char,
its 18-bit name, its mnemonic, its "absolute" name and/or a
pointer into the registry, a pointer to the device-specific
generation data, its point size, width, height above and
below the baseline, accent class and math class (see below),
kerning profile (ditto), and possibly a modifications field (see
below). If the FIF is stored in binary for efficient use,
there should also be a standard ASCII representation and
translators should be written.
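One possible shape for a FIF entry and its ASCII translators is
sketched below. The field names, their types and units, and the
key=value line format are assumptions made for the example; no
standard ASCII representation has been defined yet.

```python
from dataclasses import dataclass, asdict, fields

@dataclass
class FifEntry:
    name: int              # 18-bit full character name
    mnemonic: str          # e.g. "INT"
    absolute_name: str     # "absolute" name or registry pointer
    device_data: int       # pointer to device-specific generation data
    point_size: int
    width: int             # units are hypothetical
    height_above: int      # height above the baseline
    height_below: int      # height below the baseline
    accent_class: int
    math_class: int
    kerning_profile: int   # 36-bit profile
    modifications: str = ""  # optional modifications field

def to_ascii(entry):
    """Write an entry as one line of key=value items (values must not contain spaces)."""
    return " ".join(f"{key}={value}" for key, value in asdict(entry).items())

def from_ascii(line):
    """Rebuild an entry from a line produced by to_ascii."""
    raw = dict(item.split("=", 1) for item in line.split(" "))
    return FifEntry(**{f.name: f.type(raw[f.name]) for f in fields(FifEntry)})
```

A binary FIF would carry the same fields; the point of the sketch is
only that the two forms should convert into each other without loss.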
The math class of a character is a number representing
its mathematical meaning, e.g., binary operator, integral sign,
open or close fence (like parens). The accent class, for a
non-accent, distinguishes between lower case letters, undotted
i or j, and everything else. It also may indicate that the
char should be treated as italic even though it isn't, as for
certain Greek letters which take italic accents. (Phi does,
pi doesn't.) For an accent, it indicates which of the above
types the accent goes with, and also whether it is an above
accent (circumflex), a below accent (cedilla), or a superposed
accent (slash or bar as through an h for h/2π).
The kerning profile has to do with the problem of
putting characters next to each other. Generally a character
can be considered as a rectangle which is partly inked in,
and the char's dimensions are those of the rectangle. Consider
an italic lower case f, in a word like "food." The top of the
f must overlap the box containing the o in order not to have too
much space between the letters. Generally this is done by
understating the width of the f in the FIF. Now suppose we want
an italic f followed by a close paren. The problem will be that
the top of the f will overlap the paren. To avoid this the
program must know something about the shape of the characters,
but preferably not as much as a full description because the
computation of each kerning problem would then be incredibly
expensive. At CTI we divide each rectangle vertically into 6
pieces of equal height (the height thus depends on the total
char height), and for each compute a 3-bit representation of the
extent to which the ink extends past (or doesn't fill) the box
at each end. This adds up to 6*2*3=36 bits of profile info.
There is also a kerning problem in the other direction, as when
typesetting L**2. The little 2 should really be somewhat inside
the L's box.
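The 36-bit profile can be sketched as below. The packing order and
the reading of each 3-bit value as a signed number (positive means
ink sticks out past the box, negative means it falls short) are our
assumptions, as is the comparison rule in extra_space, which also
assumes both chars have boxes of the same height so that their slices
line up. Slice 0 is taken to be the top.

```python
SLICES = 6   # vertical slices per character box
BITS = 3     # bits per slice per side

def pack_profile(left, right):
    """Pack two lists of 6 signed values (-4..3) into one 36-bit integer."""
    if len(left) != SLICES or len(right) != SLICES:
        raise ValueError("need 6 values per side")
    word = 0
    for value in list(left) + list(right):
        if not -4 <= value <= 3:
            raise ValueError("value does not fit in 3 signed bits")
        word = (word << BITS) | (value & 0b111)
    return word

def unpack_profile(word):
    """Return (left, right) lists of signed values from a packed profile."""
    values = []
    for shift in range((2 * SLICES - 1) * BITS, -1, -BITS):
        v = (word >> shift) & 0b111
        values.append(v - 8 if v >= 4 else v)
    return values[:SLICES], values[SLICES:]

def extra_space(first, second):
    """Extra space needed so that 'first' followed by 'second' do not collide."""
    _, first_right = unpack_profile(first)
    second_left, _ = unpack_profile(second)
    worst = max(a + b for a, b in zip(first_right, second_left))
    return max(0, worst)
```

With an italic f whose top slice overhangs on the right, a following
o that leaves its top slices empty needs no extra space, while a
close paren that fills its top slice does.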
The character modification feature was developed at CTI
because our hardware made new character generation difficult and
costly. Sometimes we had problems which could be solved merely
by repositioning or changing the scale of an existing char; for
example, to get a center asterisk like * from one in superscript
(footnote) position, one simply puts a vertical drop before the
asterisk and a corresponding rise after it. Compound chars like ≤
can also be made this way, treating one part as a superposed accent
positioned over the other. In a situation like ours where it is
easier to make new chars, this might not be required, but on the
other hand it might be easier for a user than finding Tovar. The
idea is to have the FIF entry describe the "target" graphic, and
not until the last stage does the computer discover that it has
to modify the "source" graphic. This feature, if used, would best
be allowed to be nonstandard in its details so that individual
installations could provide those facilities present in the hardware.
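A sketch of the asterisk case, under the assumption that a
modification is stored as a source glyph plus a vertical drop and is
expanded only at the last stage. The operation names and the record
layout are hypothetical, and scaling is left out.

```python
from dataclasses import dataclass

@dataclass
class Modification:
    source: str     # mnemonic of the existing ("source") glyph to reuse
    drop: int = 0   # vertical drop before drawing, in device units

def expand(mod):
    """Turn a target graphic into device moves around the source graphic."""
    ops = []
    if mod.drop:
        ops.append(("vmove", -mod.drop))   # drop before the glyph
    ops.append(("draw", mod.source))
    if mod.drop:
        ops.append(("vmove", mod.drop))    # corresponding rise after it
    return ops
```

Everything upstream of expand sees only the target graphic, so an
installation can replace this step with whatever its hardware allows.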
.SEC PROPOSAL FOR GRAPHICS LANGUAGE
by Robert Sproull
This is an editor's abstract of a typewritten document.
A document is composed of "boxes" with geometry, marked where
page breaks can occur. Each box has a "body" and "i.d. info".
The body has printing rules. The i.d. has names for
subtitling and positioning relative to other boxes.
Processing within each box is independent, allowing for
incremental compilation of a document.
LISP procedures are more useful than macros, e.g., to specify
line drawings in the graphics section.
Line-drawing primitives are suggested: absolute/relative point/line,
line or curve with thickness and texture, string (caption),
device-dependent code.
Floating-point coordinate system chosen by user.
Curves in terms of endpoints and control points. Latter not
necessarily on the curve, but guide fitter.
Program must be able to interrogate the state, including questions
like "How many inches would a vector of length dx,dy occupy?".
Other questions: resolution, string dimensions, aspect ratio.
A display procedure (cf. Newman, CACM) has arguments, prog variables,
and also a "master rectangle" within which it can draw. A display
procedure call may optionally specify the instance rectangle, as well
as location, rotation, scale, and transform matrix. The
system automatically applies these transformations from the user's
coordinate system to the page.
Display procedure calls draw within a "box" of given size as
mentioned earlier.
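A sketch of how a display procedure call's location, rotation, and
scale might be applied to user coordinates. The original proposal
favors LISP procedures; Python is used here only for illustration, and
the function names and the six-number form of the transform are our
own choices, not part of the proposal.

```python
import math

def make_transform(dx=0.0, dy=0.0, rotation=0.0, scale=1.0):
    """Build an affine transform: scale and rotate about the origin, then move."""
    c = math.cos(rotation) * scale
    s = math.sin(rotation) * scale
    return (c, -s, dx, s, c, dy)

def apply(transform, x, y):
    """Map a point from the user's coordinate system to the page."""
    a, b, tx, c, d, ty = transform
    return (a * x + b * y + tx, c * x + d * y + ty)
```

For example, apply(make_transform(dx=1, dy=2, scale=2), 3, 4) doubles
the point and then shifts it, so the procedure body never needs to
know where on the page its instance lands.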
.SEC FIGURE 1
.VERBATIM
BLOCK DIAGRAM -- PART 1 OF 2
+++++++++++
| SCRIBBLE |
+++++++++++
|
∨
***********
|TEXT EDITOR|
***********
|
∨
-----------
| |
| MANUSCRIPT|
| |
-----------
|
|<-------------------------
+++++++++ ∨ |
| | *********** ************ |
| MONITOR |<--| FORMATTER |--->|ALPHABETIZER|--
| | *********** ************
+++++++++ | |
∨ ∨
------------ ----------
| | | |
|GALLEY GUIDE| | GALLEY |
| | | |
------------ ----------
.NEXT PAGE
BLOCK DIAGRAM -- PART 2 OF 2
------------ ----------
| | | |
|GALLEY GUIDE| | GALLEY |-----
| | | | |
------------ ---------- |
| |
∨ |
*********** |
| PAGINATOR | |
*********** |
| | |
∨ ∨ |
----------- ----------- |
| PAGINATED | | CROSS | |
| GALLEY | | REFERENCE | |
| GUIDE | | TABLE | |
----------- ----------- |
| | -----
| | | |
∨ ∨ ∨ |
--------------------- |
| . |
∨ . |
*********** . |
| POLISHER | . |
*********** . |
| . |
∨ . |
----------- . | +++++++++
| | ∨ ∨ ********* |HARD COPY|
| DOCUMENT |------------------| PRINTER |--->| OR |
| | | /VIEWER | | DISPLAY |
----------- ********* +++++++++
.SEC FIGURE 2
GLYPH CONVERTER
----------
| |
| REGISTRY |
| |
----------
|
∨
***********
| CONVERTER |
***********
| |
∨ ∨
------------ ---------
| GLYPH | | GLYPH |
|DESCRIPTIONS| | IMAGES |
------------ ---------
.FILL
.STANDARD BACK